11 research outputs found

    Zivilgesellschaft, Gemeinwohl und Kollektivgüter (Civil Society, Common Welfare, and Collective Goods)

    Get PDF
    As topoi and as scientific concepts, 'common welfare' (Gemeinwohl) and 'civil society' (Zivilgesellschaft) have experienced a renaissance over the last two decades. The two are often related to one another: while civil society is expected to promote the common welfare, the common welfare in turn serves, even in scholarly analytical concepts of civil society, as the hallmark of a specifically 'civil' mode of action. This paper questions that equation, theoretically as well as empirically, drawing on selected case studies of the 'common welfare' discourse in environmental conflicts. It refers primarily to a definition of civil society developed by Jürgen Kocka within the WZB Working Group "Civil Society: historical and comparative perspectives". Reference to the common welfare is shown to be suitable only to a limited extent for characterising civil-society activities. This is not only due to the concept's vagueness and normative charge; 'common welfare' also functions as a set phrase in conflicts over the use of collective goods. Whether the contested good exhibits the properties of a private or a public good therefore substantially shapes the discursive use of 'common welfare'. Since civil-society actors mostly intervene in public debates precisely where the distribution of collective goods is at stake, the negotiation of collective goods, rather than the common welfare, should be integrated into a scientific conceptualisation of civil-society action.

    Large Matrix Multiplication on a Novel Heterogeneous

    Get PDF
    This paper introduces a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing. The parallel architecture and its memory subsystem are described. We evaluate large-size matrix multiplication performance on this parallel architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with different numbers of SIMD co-processors. The experimental results show that the memory subsystem of the ePUMA architecture can effectively hide the data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, the ePUMA architecture improves matrix multiplication performance with a speedup of 45x over the conventional SIMD extension.

    Evaluation an der FH Bund

    Get PDF

    A Scalable Run-Time System for NestStep on Cluster Supercomputers

    No full text
    NestStep is a collection of parallel extensions to existing programming languages. These extensions support a shared memory model and nested parallelism. NestStep is based on the Bulk-Synchronous Parallel (BSP) programming model. Most communication of data in NestStep takes place in a combine/commit phase, which is essentially a reduction followed by a broadcast. The primary aim of the project on which this thesis is based was to develop a runtime system for NestStep-C, the extensions for the C programming language. The secondary aim was to find which tree structure, among a selected few, is best for communicating data in the combine/commit phase. This thesis includes information about NestStep, how to interface with the NestStep runtime system, some example applications, and benchmarks for determining the best tree structure. A binomial tree structure and trees similar to it were empirically found to yield the best performance.
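    The combine/commit phase described above — a reduction followed by a broadcast — can be sketched with a binomial-tree schedule. The following is a minimal simulation, not NestStep's actual runtime API; the function name and the use of plain Python lists in place of message passing are illustrative assumptions:

```python
# Sketch of a binomial-tree combine/commit: partial values are reduced
# up the tree, then the combined result is broadcast back to everyone.
# The rank/step arithmetic follows the classic binomial tree used in
# message-passing reductions.

def combine_commit(values, op):
    """Simulate combine (reduction) and commit (broadcast) for
    len(values) processes, each holding one partial value."""
    n = len(values)
    local = list(values)

    # Combine: in step s, rank r (r multiple of 2s) folds in the value
    # held by its partner rank r + s.
    s = 1
    while s < n:
        for rank in range(0, n, 2 * s):
            partner = rank + s
            if partner < n:
                local[rank] = op(local[rank], local[partner])
        s *= 2

    # Commit: rank 0 now holds the combined value; broadcasting it down
    # the same tree gives every process the result.
    return [local[0]] * n

# Example: summing one partial value per process.
print(combine_commit([1, 2, 3, 4, 5], lambda a, b: a + b))  # [15, 15, 15, 15, 15]
```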

    Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories

    No full text
    Modern signal processing systems require more and more processing capacity as time goes on. Previously, large increases in speed and power efficiency came from process technology improvements; lately, however, the gain from process improvements has been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs. Application Specific Integrated Circuits (ASICs) have long been used to accelerate the processing of tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and their product lifetime can be limited by newer standards. Since they are very specific, the applicable domain is very narrow. More general processors are more flexible and can easily adapt to perform the functions of ASIC-based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes: how general can a processor be while still being power efficient and fast enough for a particular domain? Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain and can offer sufficient performance with power efficiency and silicon cost comparable to ASICs. The flexibility allows the same hardware design to be used over several system designs, and also for multiple functions in the same system, if some functions are not used simultaneously. One problem with ASIPs is that they are more difficult to program than a general-purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed. This thesis presents ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. These kinds of algorithms are very common in, e.g., baseband processing or multimedia applications. The primary focus is on the specific features of ePUMA that are utilized to achieve high performance, and on how tools can utilize them automatically. The most significant features include data permutation for conflict-free data access and utilization of address generation features for overhead-free code execution. This sometimes requires specific information, for example the exact sequences of memory addresses that are accessed, or the fact that some operations may be performed in parallel. Such information is not always available when code is written in the traditional way with traditional languages such as C, as extracting it automatically is still a very active research topic. In the near future at least, the way software is written needs to change to exploit all hardware features, though often for the better: a frequent problem with current methods is that code is overly specific, and more general abstractions are actually easier to generate code from.
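    The data permutation for conflict-free access mentioned above can be illustrated with a small model. The skewed-storage scheme below is a generic textbook example of conflict-free bank mapping, not ePUMA's actual permutation hardware:

```python
# Toy model of bank conflicts in an 8-bank memory where an address maps
# to bank = address % NUM_BANKS. Walking a column of a row-major 8x8
# matrix hits one bank repeatedly; skewing each row by its row index
# makes the same column walk touch every bank exactly once.

NUM_BANKS = 8

def banks_touched(addresses):
    return [a % NUM_BANKS for a in addresses]

# Linear layout: element (r, c) lives at address r*8 + c.
linear_col0 = [r * 8 + 0 for r in range(8)]

# Skewed layout: element (r, c) lives at address r*8 + (c + r) % 8.
skewed_col0 = [r * 8 + (0 + r) % 8 for r in range(8)]

print(banks_touched(linear_col0))  # [0, 0, 0, 0, 0, 0, 0, 0] -- all conflict
print(banks_touched(skewed_col0))  # [0, 1, 2, 3, 4, 5, 6, 7] -- conflict-free
```

    With the skewed layout, an 8-lane SIMD load of a matrix column can be serviced by all eight banks in a single cycle instead of serializing on one bank.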

    Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor

    No full text
    The host-multi-SIMD chip multiprocessor (CMP) architecture has proven to be an efficient architecture for high-performance signal processing, exploring both task-level parallelism through multi-core processing and data-level parallelism through SIMD processors. Different from the cache-based memory subsystem in most general-purpose processors, this architecture uses on-chip scratchpad memory (SPM) as the processor-local data buffer and allows software to explicitly control data movements in the memory hierarchy. This SPM-based solution is more efficient for predictable signal processing in embedded systems where data access patterns are known at design time. The predictable performance is especially important for real-time signal processing. According to Amdahl's law, the non-parallelizable part of an algorithm has a critical impact on overall performance. Implementing an algorithm on a parallel platform usually produces control and communication overhead which is not parallelizable. This paper presents the architectural support in an embedded multiprocessor platform to maximally reduce the parallel processing overhead. The effectiveness of these architecture designs in boosting parallel performance is evaluated with an implementation example of 64x64 complex matrix multiplication. The result shows that the parallel processing overhead is reduced from 369% to 28%.
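    The Amdahl's-law argument above can be made concrete with a few lines of arithmetic; the serial fractions below are illustrative values, not the paper's measurements:

```python
# Amdahl-style model: with serial (non-parallelizable) fraction f and
# p processors, speedup = 1 / (f + (1 - f) / p). Parallel control and
# communication overhead behaves like extra serial work, so shrinking
# it matters more than adding processors once p is large.

def speedup(serial_fraction, p):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

for f in (0.5, 0.2, 0.05):
    print(f"serial fraction {f:.0%}: speedup on 8 cores = {speedup(f, 8):.2f}x")
```

    Cutting the exposed overhead from roughly half the runtime to a few percent is what moves an 8-core system from under 2x toward its ideal 8x speedup, which is why the paper's reduction from 369% to 28% overhead is significant.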

    ePUMA: a novel embedded parallel DSP platform for predictable computing

    No full text
    In this paper, a novel parallel DSP platform based on a master-multi-SIMD architecture is introduced. The platform is named ePUMA [1]. The essential technique is to use separate data-access kernels and algorithm kernels, minimizing the communication overhead of parallel processing by running the two types of kernels in parallel. The ePUMA platform is optimized for predictable computing. According to benchmarking results, the memory subsystem design, which relies on regular and predictable memory accesses, can dramatically improve performance. As a scalable parallel platform, the chip area is estimated for different numbers of co-processors. The aim of the ePUMA parallel platform is to achieve low-power, high-performance embedded parallel computing with low silicon cost for communications and similar signal processing applications.
    ©2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu, ePUMA: a novel embedded parallel DSP platform for predictable computing, 2010, International Conference on Information and Electronics Engineering, (5), 32-35. http://dx.doi.org/10.1109/ICETC.2010.5529952
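    Running data-access kernels in parallel with algorithm kernels amounts to overlapping transfers with computation. The double-buffering timing model below is a generic sketch of that idea, not ePUMA's actual programming model; the timings are made-up illustrative units:

```python
# Generic double-buffering model: while the compute kernel works on
# buffer A, the data-access kernel fills buffer B. With overlap, only
# the first load and the last computation are fully exposed; every
# other load hides behind the previous block's computation.

def process_blocks(n_blocks, load_time, compute_time):
    """Return (sequential, overlapped) total modeled time."""
    sequential = n_blocks * (load_time + compute_time)
    overlapped = (load_time
                  + (n_blocks - 1) * max(load_time, compute_time)
                  + compute_time)
    return sequential, overlapped

print(process_blocks(16, load_time=2, compute_time=5))  # (112, 82)
```

    When computation dominates (compute_time >= load_time), the transfer cost all but disappears from the total, which is the effect the abstract describes as hiding the data access overhead.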

    Software programmable data allocation in multi-bank memory of SIMD processors

    No full text
    The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user-friendly programmability. It explores both task-level parallelism and data-level parallelism via the on-chip multiple SIMD coprocessors. For embedded DSP applications with predictable computing behavior, this architecture can be further optimized for performance, implementation cost, and power consumption. The optimization can be done by improving SIMD processing efficiency and reducing redundant memory accesses and data shuffle operations. This paper introduces one effective approach: designing a software-programmable multi-bank memory system for SIMD processors. Both the hardware architecture and the software programming model are described, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data access flexibility by using lookup-table-based address generators and by applying data permutations at both the DMA controller interface and the SIMD data access. The evaluation results show that a SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves implementation cost on SIMD local registers; in our system, each SIMD core has only 8 128-bit vector registers.
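    The lookup-table-based address generation described above can be modeled in a few lines. The table contents and function names here are purely illustrative assumptions, not ePUMA's actual permutation tables or interfaces:

```python
# Minimal model of a lookup-table address generator: software preloads
# a permutation table, and the generator then streams addresses in the
# permuted order, so the compute kernel needs no per-element address
# arithmetic.

def make_address_generator(lut, base=0):
    """Return a generator function yielding base + lut[i] per entry."""
    def gen():
        for offset in lut:
            yield base + offset
    return gen

# Software loads a bit-reversal permutation for 8 elements (a common
# DSP access pattern, used here purely as an example).
bit_reversed = [0, 4, 2, 6, 1, 5, 3, 7]
agu = make_address_generator(bit_reversed, base=0x100)
print([hex(a) for a in agu()])
```

    Because the table is software-programmable, the same hardware can serve matrix-column walks, FFT bit-reversal, or any other fixed pattern without dedicated shuffle instructions in the kernel.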